Exploring global diverse attention via pairwise temporal relation for video summarization

Abstract

Video summarization is an effective way to facilitate video searching and browsing. Most existing systems employ encoder-decoder based recurrent neural networks, which fail to explicitly diversify the system-generated summary frames while requiring intensive computations. In this paper, we propose an efficient convolutional network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism in a global perspective to consider pairwise temporal relations of video frames. In particular, the GDA module has two advantages: (1) it models the relations within paired frames as well as among all pairs, thus capturing global attention across all frames of one video; (2) it reflects the importance of each frame to the whole video, leading to diverse attention on these frames. Thus, SUM-GDA is beneficial for generating diverse frames to form a satisfactory video summary. Extensive experiments on three data sets, i.e., SumMe, TVSum, and VTW, have demonstrated that SUM-GDA and its extension outperform other competing state-of-the-art methods with remarkable improvements. In addition, the proposed models can be run in parallel with significantly less computational costs, which helps deployment in highly demanding applications.
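
The idea of attending over every pair of frames and then scoring each frame against the whole video can be illustrated with a small sketch. This is not the SUM-GDA implementation and does not reproduce the GDA module or its diversity term: it assumes plain scaled dot-product attention over raw frame features, and the function name `global_pairwise_attention` and the "average attention received" importance score are hypothetical choices made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_pairwise_attention(frames):
    """Illustrative pairwise attention over all frames of one video.

    frames: list of equal-length feature vectors, one per frame.
    Returns (attention, importance), where attention[i][j] is how much
    frame i attends to frame j, and importance[j] is the average
    attention frame j receives from all frames (an assumed scoring rule,
    not the one used in the paper).
    """
    n = len(frames)
    scale = math.sqrt(len(frames[0]))
    # Score every pair (i, j), so each frame is compared with all others.
    scores = [
        [sum(a * b for a, b in zip(fi, fj)) / scale for fj in frames]
        for fi in frames
    ]
    # Each row is normalized, so frame i spreads one unit of attention.
    attention = [softmax(row) for row in scores]
    # Column average: how strongly the whole video attends to frame j.
    importance = [sum(attention[i][j] for i in range(n)) / n for j in range(n)]
    return attention, importance


# Toy example: three frames with 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attention, importance = global_pairwise_attention(frames)
print(importance)
```

Because every pairwise score is independent of the others, the score matrix can be computed in one pass without the step-by-step recurrence of an encoder-decoder RNN, which is the kind of parallelism the abstract refers to.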

Similar resources

Automatic video summarization driven by a spatio-temporal attention model

According to the literature, automatic video summarization techniques can be classified into two categories according to the nature of their output: “video skims”, which are generated using portions of the original video, and “key-frame sets”, which correspond to images selected from the original video that have significant semantic content. The difference between these two categories is reduced when we cons...

Diverse Sequential Subset Selection for Supervised Video Summarization

Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach the system to learn from human-created summaries how to select informative and diverse subsets, so ...

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling

The human vision system actively seeks salient regions and movements in video sequences to reduce search effort. Modeling a computational visual saliency map provides important information for semantic understanding in many real-world applications. In this paper, we propose a novel video saliency detection model for detecting the attended regions that correspond to both interesting objects and do...

Perception-oriented video saliency detection via spatio-temporal attention analysis

The human visual system actively seeks salient regions and movements in video sequences to reduce search effort. A computational visual saliency detection model provides important information for semantic understanding in many real-world applications. In this paper, we propose a novel perception-oriented video saliency detection model to detect the attended regions for both interesting objects an...

Video Question Answering via Hierarchical Spatio-Temporal Attention Networks

Open-ended video question answering is a challenging problem in visual information retrieval, which automatically generates a natural language answer from the referenced video content according to the question. However, existing visual question answering works focus only on static images and may be ineffective when applied to video question answering due to the lack of modeling the tem...

Journal

Journal title: Pattern Recognition

Year: 2021

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2020.107677